
APP ID= 10064325 



Page 236 of 287 



How Rings assign address space 

Step 1: align incoming address to self (to some power of 2)
Step 2: assign the result to self address

Step 3: next_addr = self_addr + self_addr_space; // number of registers used locally
Step 4: send down next_addr
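The four steps above can be sketched in a few lines of Python. This is only an illustration: the function names, the round-up alignment rule, and the rule that each member's address space is itself a power of 2 are assumptions, not taken from the figure.

```python
def align_up(addr: int, size: int) -> int:
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of 2")
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(start_addr: int, spaces: list[int]) -> list[tuple[int, int]]:
    """Walk the ring once and return (self_addr, next_addr) for each member.

    spaces[i] is the address space (number of registers) member i needs.
    """
    result = []
    addr = start_addr
    for space in spaces:
        self_addr = align_up(addr, space)   # steps 1 and 2: align, then keep as self address
        next_addr = self_addr + space       # step 3: skip over the locally used registers
        result.append((self_addr, next_addr))
        addr = next_addr                    # step 4: send next_addr down the ring
    return result
```

Calling `enumerate_ring(8, [16, 256])` models one member that needs 16 addresses followed by one that needs 256, starting from an incoming address of 8.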



Enet needs 16 addr
Timer needs 256



Addr:=8 



self =32 



Addr=36 self=256



clk1



clk2



If the delay between "clk1" and "clk2" is
greater than the delay from Q to D of the
second flip-flop, we have a race on our hands,
meaning the right-hand flip-flop will
sample the data of Q a whole clock period
early.
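The condition above can be written as a one-line check. This is a minimal sketch; the function name, the nanosecond units, and the optional hold-time term are assumptions added for illustration.

```python
def has_race(clock_skew_ns: float, q_to_d_delay_ns: float, hold_time_ns: float = 0.0) -> bool:
    """Return True if the second flip-flop may sample the new Q value too early.

    clock_skew_ns   : delay between clk1 and clk2 (clk2 arriving later)
    q_to_d_delay_ns : delay from Q of the first flip-flop to D of the second
    hold_time_ns    : hold requirement of the second flip-flop (assumed, default 0)
    """
    return clock_skew_ns + hold_time_ns > q_to_d_delay_ns
```

With a skew of 0.5 ns and a Q-to-D delay of 0.3 ns the check reports a race; with a skew of 0.2 ns it does not.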






compound A compound B 



clock runs with data

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound.
Because it is always the same standard
ring-interface module, we can ensure
that the delay will be at least enough
and, more importantly, easily checked
after layout.



compound i 



data_a 



compound U 



clock opposes data

This arrangement has the advantage
of automatically ensuring that no race
condition exists (at least in this simple
case).

data_b

data_a, which changes after clkb,
which is later than clka, is sampled
by clka. NO RACE.



clka



clkb



OK




compound B 



compound A



ff, latch



Qb 



ff 



clka 



data



clock



clock




clka 



clkb 



data_a 




clock 



uncertainty range



data_a leaving the bridge goes to member "b", where
it should be sampled by the rising edge of clkb. clkb
lags a lot behind clka of the bridge. As clearly seen
from the waveforms, a race is imminent. Here we
should add latches for all the data lines ("90).
Adding a latch works, however, only if the delay between
clka and clkb is less than 75% of the cycle time;
otherwise the uncertainty kills the usable time. This
sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
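To see how the 75% rule turns into a limit on ring size, here is a rough sketch. It assumes that every member adds the same clock delay, which the text does not state; the function and parameter names are hypothetical.

```python
import math


def max_ring_members(cycle_ns: float, skew_per_member_ns: float, budget: float = 0.75) -> int:
    """Largest member count whose accumulated clock lag stays below budget * cycle time.

    Assumes a uniform clock delay per member (an illustrative simplification).
    """
    if skew_per_member_ns <= 0:
        raise ValueError("skew_per_member_ns must be positive")
    limit = budget * cycle_ns
    n = math.floor(limit / skew_per_member_ns)
    if n * skew_per_member_ns >= limit:   # the lag must be strictly less than the limit
        n -= 1
    return max(n, 0)
```

For a 10 ns cycle and 1 ns of added clock delay per member, the budget is 7.5 ns, so at most 7 members fit under this assumption.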
data




clock




uncertainty range



Here, data_b leaves member "b" to be sampled by
clka in the bridge. But now clkb lags a lot behind
clka. This actually works to our advantage, provided the
lag is smaller than the better part of a clock cycle. This
solution looks better because, between adjacent
members, we can take care to delay the data
beyond the danger zone of the clock delay, the OK signals
are covered automatically, and the last-leg data is also
covered. The only signal that is not safe is the OK from
the bridge to member "b". It will need a latch in "b".



big module 



local_clock

local_data_out



data from
previous member

clk




ring interface 



The local clock lags behind the
ring_interface clock of this
module, because we presume the
module is big. For data coming
out, this is not a problem: it changes
later than the ring-i/f flip-flops' clock.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to * V*. In this case
communication between "b" and "c" suffers degradation
when the traffic peaks coincide.
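The latency asymmetry can be illustrated with a small hop counter. The sketch assumes a one-directional ring whose members are ordered a, b, c, d and that every hop costs the same; neither detail is spelled out in the text.

```python
def ring_hops(members: list[str], src: str, dst: str) -> int:
    """Number of hops a message travels on a one-directional ring from src to dst."""
    i = members.index(src)
    j = members.index(dst)
    return (j - i) % len(members)


ring = ["a", "b", "c", "d"]       # assumed ring order
a_to_b = ring_hops(ring, "a", "b")  # neighbour in the direction of travel
c_to_b = ring_hops(ring, "c", "b")  # has to go almost all the way around
```

Under these assumptions "a" reaches "b" in a single hop, while "c" needs three hops to reach "b".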
The land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets quickly to D1. By the same magic,
a message from V1 to D2 gets bypassed also;
a message from V1 to D1 is treated directly.




Enumeration is started by the "Anchor",
which assigns address=1 to itself. The results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is the "near" end, which got
enumeration label "3", and the "far" end,
marked 6.
The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.



So we have exactly one enforcer for each
ring.



3 near 




11 far



In a land bridge ring, the situation is trickier. If V2
sends a message to address==5, the land bridge
diverts it at the 11/far end; it will re-appear at 3 and
start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
going through are left alone.
type




memberA






mem 




meml 




ok 






scan_test






reset

















clk



type 



ok 



idle / msgA 



\idJ c 



During the first clock, OK remains active while type
is msgA. This means that on the next clock
memberA may send a new message. memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in memberB is full. One
clock later, the FIFO has a free entry, so OK returns to
1 and type returns to idle on the next clock. The return to idle
could also be a change to the next message, if there was
one.



imsg iok 



omsg 



ook 




umsg 



uok 
member B

imsg iok omsg ook




The incoming messages are examined first
to see whether they are supervisor or work/program messages.
Work/program messages have an address field.
We check whether it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during
the enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the implementation of
the unused bits.
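The split-mask comparison described above might look like the following sketch. The 20-bit address width, the function name, and the way the mask is derived from the member's address-space size are assumptions made for illustration.

```python
ADDR_BITS = 20  # assumed address width for this sketch


def decode_address(addr: int, self_addr: int, space_bits: int) -> tuple[bool, int]:
    """Split an incoming address into an ours/through decision and an internal address.

    space_bits is log2 of the member's address space, so the member owns
    2**space_bits consecutive addresses starting at its aligned self address.
    """
    full = (1 << ADDR_BITS) - 1
    low_mask = (1 << space_bits) - 1
    split_mask = full & ~low_mask                                 # selects the upper bits only
    ours = (addr & split_mask) == (self_addr & split_mask)        # compare upper bits
    internal = addr & low_mask                                    # lower bits go inside the member
    return ours, internal
```

For a member whose self address is 0x100 with a 16-entry address space, address 0x105 is accepted with internal address 5, while 0x115 is passed through.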




ours/through 



part of the address 
that enters the member 






Member 



Rif_i_* Rif_* Rif_o_* module_id



RIF 



Address Space » 7 



Activation register 



300 






Member 



Rif_i_options[5:0]

Rif_i_addr[*:0]

Rif_i_data_l/h[31:0]



Rif_i_write
Rif_i_read



Rif_i_ok Rif_i_clock



RIF



address_space



Activation register



300 
Member



Rif_o_type[7:0]

Rif_o_addr[19:0]

Rif_o_data_l/h[31:0]



300
Member



Rif_i_* Rif_* Rif_o_* module_id



RIF



Address Space 7



Activation register



300 







The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



Application 

Specific 
Accelerators 

CRC 
Encryption 
Table Lookup 
Hashing 



Internal Memory 
Fast, Unified, Multi-port 



Network Processor 



Peripheral Expansion Area
Enet, ATM, Uart, USB, Serials



System
Expansion
Area

CPU (PP)

DMA
Smart FIFO
Ext. mem I/F
Internal Memory



Fetch Unit

FTU



LSU



Program Sequencer

PSU



Vobia Core



Register File



Arithmetic

DALU



AGl



DMA Agent



Agent Bus
External memory



Internal memory



Peripheral
File



stream data out



32 registers of
32 bits per task,
which control task execution

Fast memory accessed
by load/store
instructions



Special



External
memory



per task and
global registers



Big area accessed
via a DMA interface
Rl register:



s - sticky bit
eq - equal/zero

mb - reflection of the RAM multi-reader busy indication.



[Register bit-field diagrams; recoverable field labels: MASK, TRAP]
Frame structure
of an example



common
task data

task fragment 1
data

task fragment
2 data

task fragment 3
data

data of all level2 functions

level1
data

level1
data

level1
data



size of level0
frame part is
different for
each task

size of level2
frame part is
constant

size of level1
frame part is
different per
each task
type
FETCH DECODE ADDRESS EXECUTE WRITE



Legend: flip-flop, logic
CRC



noop



last_in_frame
last_in_transfer



crc_stall
calculate



memory



Vobia



mem_stall
mr_address[17:0]



Vobia multireader_busy



source_add[15:0]



memory 
interface! 

data packer and aligner    data_out[63:0]



Vobia 

request
entry



input

message 
decoder 



request 
fifo 



^s^ad^3 
data_out counter



01 output 
Message 
encoder 



typc^ m 
ejntcr fatc_addf ea;^ * 5 .0 
R:Jnlcrfttcc_datii[3 1 :0J 



type_out[7:0] (via CRC)
address_out[23:0]



data_out[63:0]
(also CRC)

ok2drive



memory data



message data



agent command opcode
(AID=multireader)

Vobia entry
in multireader

increment, first, snoop, last; source addr in sram; address of destination; number of bytes to transfer
Vobla



agent interface



main
entry



shadow
entry



output
message
encoder



type_out[7:0]
address_out[23:0]
data_out[63:0]



reset



clk



ok2drive



agent command

(AID=message)
(option[6]=0)



opcode


options[9:0]


RA


AID[4:0]


RB/imm8



{0, options[5:0]}  raw_data[31:0]



raw_address[23:0]



type[7:0]



message data



address_of_destination    message type



agent command

(AID=message)
(option[6]=1)



opcode



RA


AID[4:0]


RB



{1, options[5:0]}  raw_data[63:0]



raw_address[23:0]



message data



address_of_destination    message type



set_mask



doorbell



set_mask2



set_mask3



Vobla 



agent interface
stall vobla



token 
control 



.id[5H)] 



request
entries

X2



DMA
context table



input

message 
decoder 



output 
message 
encoder 



type_in[7:0]



write_interface_address[23:0]
write_interface_data[31:0]
n^] pas^nddrcs5[7 :0]

type_out[7:0]
address_out[23:0]
data_out[63:0]



ok2drive
agent command opcode
(AID=dma agent)



dma request
entry

iruHlify urgent dir autoscndloog dram address

(OP2> (OP I) set ack addfcss auures*



sram address[23:0]  count[7:0]



tf)PO) (OP3)(OPtO>



number of bytes
to transfer or
address modifier



multireader



last_in_transfer



last_in



calculate_crc



multireader data[31:0]



register file



input buffer



tx data mux



CRC data



on demand



Vobia 



agent interface
stall Vobia



random 
number 
generate 



xtrC — 

15,10,32 

machine 



checksum 
machine 



bip16 
machine 



clk



reset 



agent command
(AID=CRC agent)



type[2:0]



size[2:0]



CRC type    data size



opcode


options[9:0]


RA


AID[4:0]



DATA[63:32]



DATA[31:0]



check



overwrite
residue



RESIDUE
agent interface



control
register



counter



prescaler



div



clk

reset



time stamp
register



opcode


options[9:0]


RA


AID


RD/imm8



tps[9:0]
agent interface
v_task_switch



v_set_doorbell_task

v_next_task_valid



Vobia 



v_currcnt__iask_k [5:0] 
valid 
0] 

0] 
2:01 
1:0] 



input

message
decoder



v_current_

v_next_task_



task mask
register file



request
register file
and counter



mask control 




priority logic 


logic 







J- 



rif_i_write

rif_i_doorbell_cs



rif_i_addr[5:0]

rif_i_data[4:0]

dm_set_mask0
dm_set_mask1
dm_set_mask2



dm_set_mask3




rif_i_reset



rif_i_clock



agent command
(AID=doorbell agent)



opcode



options[9:0]



RA



AID[4:0]



RB/imm8



set/clear clear clear
global request mask
(OP2) (OP1) (OP0)



{0,0,0,0,0,mask_bit_index[2:0]} write mask
or

{0,0,0,0,1,req_bit_index[2:0]} write request
or

{1,0,0,0,0,count_value[2:0]} write counter
or

{0,1,0,0,0,0,0,0} write TGMR
or

{1,1,0,0,0,0,0,urgent_value} write urgent
Memory controller
interface
input